2020-APRIL-27
Our regression workflow is built on two imperatives:
\[ y_i \sim \operatorname{Normal}(\mu_i, \sigma) \\ \sigma \sim \operatorname{Exponential}(1) \\ \mu_i = \alpha + \beta x_i \\ \boldsymbol{\eta} = \boldsymbol{\alpha} + \boldsymbol{X}\boldsymbol{\beta} \]
\(g(\cdot)\) = link function; \(h(\cdot) = g^{-1}(\cdot)\) = inverse link function
where \(\boldsymbol{\eta}\) is the linear predictor.
The model uses a binomial likelihood with a logit link function:
\[ \operatorname{Home}_i \sim \operatorname{Binomial}(\operatorname{Ownership}_i, p_i) \]
\(\operatorname{logit}(p) = \log_e\frac{p}{1-p}\)
\(\operatorname{logit}^{-1}(x) = \frac{e^{x}}{1+e^{x}}\)
\[ \Pr(y_i = 1) = p_i \\ logit(p_i) = \alpha + \beta x_i \\ \Pr(y_i = 1) = logit^{-1}(\alpha + \beta x_i) \]
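In R, these two functions are built in: `qlogis()` is the logit and `plogis()` is the inverse logit. A minimal sketch checking them against hand-rolled versions:

```r
# logit: probability -> log-odds
p <- 0.75
qlogis(p)            # log(p / (1 - p)) = log(3) ~ 1.0986

# inverse logit: log-odds -> probability
plogis(qlogis(p))    # recovers 0.75

# hand-rolled versions match the built-ins
logit     <- function(p) log(p / (1 - p))
inv_logit <- function(x) exp(x) / (1 + exp(x))
logit(p)             # ~ 1.0986
inv_logit(logit(p))  # 0.75
```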
## Parameter   | Log-Odds |   SE |       95% CI |     z |      p
## -------------------------------------------------------------
## (Intercept) |     1.46 | 0.05 | [1.36, 1.55] | 30.38 | < .001
\[ \log\left[ \frac { \widehat{P( \operatorname{HomeOwner} = \operatorname{1} )} }{ 1 - \widehat{P( \operatorname{HomeOwner} = \operatorname{1} )} } \right] = 1.46 \]
plogis(1.46) # inverse logit of the intercept (printed value uses the unrounded estimate)
## [1] 0.811068
## Parameter       | Log-Odds |   SE |       95% CI |     z |      p
## -----------------------------------------------------------------
## (Intercept)     |     1.58 | 0.05 | [1.48, 1.69] | 29.37 | < .001
## Household.INC_s |     0.72 | 0.08 | [0.57, 0.89] |  8.86 | < .001
\[ \log\left[ \frac { \widehat{P( \operatorname{HomeOwner} = \operatorname{1} )} }{ 1 - \widehat{P( \operatorname{HomeOwner} = \operatorname{1} )} } \right] = 1.58 + 0.72(\operatorname{Household.INC\_s}) \]
## We fitted a logistic model (estimated using ML) to predict HomeOwner
## with Household.INC_s (formula: HomeOwner ~ Household.INC_s). The
## model's explanatory power is weak (Tjur's R2 = 0.04). The model's
## intercept, corresponding to Household.INC_s = 0, is at 1.58 (95% CI
## [1.48, 1.69], p < .001). Within this model:
##
##   - The effect of Household.INC_s is significantly positive (beta =
##     0.72, 95% CI [0.57, 0.89], p < .001; Std. beta = 0.73, 95% CI
##     [0.57, 0.89])
##
## Standardized parameters were obtained by fitting the model on a
## standardized version of the dataset. 95% Confidence Intervals (CIs)
## and p-values were computed using a Wald z-distribution approximation.
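To translate these log-odds into predicted probabilities of home ownership at chosen (standardized) income values, pass the linear predictor through the inverse logit. A minimal sketch using the rounded coefficients above:

```r
a <- 1.58   # intercept: log-odds at mean household income
b <- 0.72   # slope: change in log-odds per SD of household income

# predicted probability of home ownership at selected incomes
plogis(a + b * -1)   # one SD below the mean: ~0.70
plogis(a + b *  0)   # at the mean:           ~0.83
plogis(a + b *  1)   # one SD above the mean: ~0.91
```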
The range of incomes is 0 to \(2.5 \times 10^{6}\).
## [1] 0.979438
## Parameter       | Log-Odds |   SE |       95% CI |     z |      p
## -----------------------------------------------------------------
## (Intercept)     |     1.60 | 0.05 | [1.50, 1.71] | 29.21 | < .001
## Household.INC_s |     0.78 | 0.08 | [0.62, 0.95] |  9.27 | < .001
scale_x_continuous(limits = c(-1.2, 4))
This code constrains both graphs to the same x-axis scale. Had I left it out, the graphs would have looked like this:
## Parameter       | Coefficient |       SE |       95% CI | t(2796) |      p
## --------------------------------------------------------------------------
## (Intercept)     |        0.81 | 7.27e-03 | [0.80, 0.83] |  111.88 | < .001
## Household.INC_s |        0.06 | 7.23e-03 | [0.04, 0.07] |    8.04 | < .001
## Parameter                              | Log-Odds |   SE |         95% CI |      z |      p
## --------------------------------------------------------------------------------------------
## (Intercept)                            |     2.13 | 0.10 | [ 1.94,  2.34] |  20.68 | < .001
## GenCohortGen_Silent * born< 1946       |     0.26 | 0.28 | [-0.26,  0.85] |   0.94 | 0.347
## GenCohortGenX * born >=1961 & b.< 1980 |    -0.62 | 0.13 | [-0.87, -0.38] |  -4.90 | < .001
## GenCohortGenY * born >=1980 & b.< 1996 |    -1.87 | 0.14 | [-2.15, -1.59] | -13.00 | < .001
## GenCohortGenZ * born >= 1996           |    -4.91 | 1.04 | [-7.80, -3.30] |  -4.74 | < .001
## Parameter                              | Log-Odds |   SE |         95% CI |      z |      p
## --------------------------------------------------------------------------------------------
## (Intercept)                            |     2.60 | 0.12 | [ 2.37,  2.84] |  21.79 | < .001
## GenCohortGen_Silent * born< 1946       |     0.70 | 0.30 | [ 0.15,  1.33] |   2.35 | 0.019
## GenCohortGenX * born >=1961 & b.< 1980 |    -0.99 | 0.14 | [-1.26, -0.73] |  -7.31 | < .001
## GenCohortGenY * born >=1980 & b.< 1996 |    -2.40 | 0.16 | [-2.71, -2.09] | -14.94 | < .001
## GenCohortGenZ * born >= 1996           |    -4.99 | 1.06 | [-7.91, -3.32] |  -4.72 | < .001
## Household.INC_s                        |     1.19 | 0.10 | [ 0.99,  1.39] |  11.62 | < .001
##
## Gen Boombers: born >= 1946 & b.< 1961         Gen_Silent: born< 1946
##                                  1024                            208
##          GenX: born >=1961 & b.< 1980   GenY: born >=1980 & b.< 1996
##                                  1253                            416
##                  GenZ: born >= 1996
##                                  17
## # Comparison of Model Performance Indices
##
## Name  | Model |      AIC |      BIC | Tjur's R2 |  RMSE | Sigma | Log_loss | Score_log |   PCP | Score_spherical
## ----------------------------------------------------------------------------------------------------------------
## home2 |   glm | 2590.320 | 2602.193 |     0.036 | 0.381 | 0.962 |    0.462 |      -Inf | 0.708 |
## mg1   |   glm | 2516.922 | 2546.675 |     0.099 | 0.372 | 0.941 |    0.442 |      -Inf | 0.724 |       3.915e-04
## mg2   |   glm | 2272.307 | 2307.927 |     0.168 | 0.354 | 0.900 |    0.404 |      -Inf | 0.748 |       5.159e-04
## # Accuracy of Model Predictions
##
## Accuracy: 66.39%
##       SE: 2.72%-points
##   Method: Area under Curve
## # Accuracy of Model Predictions
##
## Accuracy: 77.02%
##       SE: 2.97%-points
##   Method: Area under Curve
\[ y_i \sim \operatorname{Poisson}(\lambda_i) \\ \log(\lambda_i) = \alpha + \beta x_i \\ E(y_i \mid x_i) = \lambda_i = \exp(\alpha + \beta x_i) \]
## Parameter   | Log-Mean |   SE |       95% CI |     z |      p
## -------------------------------------------------------------
## (Intercept) |     1.01 | 0.11 | [0.79, 1.21] |  9.40 | < .001
## x           |     1.99 | 0.08 | [1.84, 2.14] | 26.24 | < .001
\[ \log ({ \widehat{E( \operatorname{y} )} }) = 1.01 + 1.99(\operatorname{x}) \]
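Exponentiating moves these coefficients from the log-mean scale to the expected-count scale. A minimal sketch using the rounded estimates above:

```r
a <- 1.01   # intercept on the log scale
b <- 1.99   # slope on the log scale

# expected count at x = 0 is exp(intercept)
exp(a)          # ~2.75

# each one-unit increase in x multiplies the expected count by exp(b)
exp(b)          # ~7.3

# e.g. expected count at x = 1
exp(a + b * 1)  # ~20.1
```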
# model
pois2 <- glm(y ~ x, data = fake) # family = poisson removed; glm defaults to gaussian
# graph
p_pois2 <- plot(ggeffects::ggpredict(pois2, terms = "x"),
add.data = TRUE)
# render
p_pois2
\[ y_i \sim \operatorname{NegBinomial}(\lambda_i, \phi) \\ \log(\lambda_i) = \alpha + \beta x_i \\ \phi \sim \operatorname{Gamma}(0.01, 0.01) \]
## # Overdispersion test
##
##        dispersion ratio =   5.068
##   Pearson's Chi-Squared = 496.695
##                 p-value = < 0.001
\[ y_i \sim \operatorname{ZIPoisson}(p_i, \lambda_i) \\ \operatorname{logit}(p_i) = \alpha_p + \beta_p x_i \\ \log(\lambda_i) = \alpha_\lambda + \beta_\lambda x_i \]
## [1] 0.6826594
## [1] 1.680429
There is about 1.68 times more dispersion than a Poisson model would expect.
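The dispersion ratio can be computed by hand as the sum of squared Pearson residuals over the residual degrees of freedom. A minimal sketch on simulated data (the function name and toy data are my own, not from the original analysis):

```r
# dispersion ratio: ~1 is consistent with the Poisson assumption;
# values well above 1 indicate overdispersion
dispersion_ratio <- function(model) {
  sum(residuals(model, type = "pearson")^2) / df.residual(model)
}

# toy example: overdispersed (negative-binomial) counts fitted as Poisson
set.seed(1)
y <- rnbinom(200, mu = 3, size = 0.5)
m <- glm(y ~ 1, family = poisson)
dispersion_ratio(m)   # well above 1, flagging overdispersion
```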
## # Check for zero-inflation
##
##    Observed zeros: 1992
##   Predicted zeros: 536
##             Ratio: 0.27
## # Overdispersion test
##
##        dispersion ratio =    13.196
##   Pearson's Chi-Squared = 37595.179
##                 p-value =   < 0.001
Zero-inflated Poisson model (fitted with the brms package):

##  Family: zero_inflated_poisson
##   Links: mu = log; zi = identity
## Formula: HoursCharity ~ Relid_s + Household.INC_s
##    Data: nz (Number of observations: 2796)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Population-Level Effects:
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept           1.70      0.02     1.67     1.73 1.00     3785     3068
## Relid_s             0.00      0.01    -0.02     0.03 1.00     4501     3362
## Household.INC_s    -0.09      0.02    -0.12    -0.05 1.00     3704     3054
##
## Family Specific Parameters:
##    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## zi     0.70      0.01     0.68     0.72 1.00     3849     2989
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
##  Family: zero_inflated_negbinomial
##   Links: mu = log; shape = identity; zi = identity
## Formula: HoursCharity ~ Relid_s + Household.INC_s
##    Data: nz (Number of observations: 2796)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Population-Level Effects:
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept           0.89      0.17     0.52     1.17 1.00      949      885
## Relid_s             0.21      0.06     0.11     0.33 1.00     1456     1698
## Household.INC_s    -0.07      0.05    -0.16     0.03 1.00     2061     2040
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.28      0.07     0.16     0.42 1.00      938      914
## zi        0.35      0.11     0.07     0.50 1.00      939      697
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
##    elpd_diff se_diff
## b1     0.0       0.0
## b0 -1514.8     207.5
We can translate the loo_compare output into the WAIC convention as follows:
##    waic_diff       se
## b1     0.000   0.0000
## b0  3029.657 414.9648
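The translation is just a change of scale: WAIC differences live on the deviance scale, so multiply elpd differences by -2 (and their standard errors by 2). A sketch using the rounded values above:

```r
# loo_compare reports differences on the elpd scale;
# the (W)AIC / deviance scale is -2 * elpd
elpd_diff <- c(b1 = 0.0, b0 = -1514.8)
se_diff   <- c(b1 = 0.0, b0 = 207.5)

waic_diff <- -2 * elpd_diff   # b0: 3029.6
waic_se   <-  2 * se_diff     # b0: 415.0
```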
##  Family: zero_inflated_negbinomial
##   Links: mu = log; shape = identity; zi = identity
## Formula: HoursCharity ~ Relid_s + Household.INC_s
##    Data: nz (Number of observations: 2796)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Population-Level Effects:
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept           0.89      0.17     0.52     1.17 1.00      949      885
## Relid_s             0.21      0.06     0.11     0.33 1.00     1456     1698
## Household.INC_s    -0.07      0.05    -0.16     0.03 1.00     2061     2040
##
## Family Specific Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## shape     0.28      0.07     0.16     0.42 1.00      938      914
## zi        0.35      0.11     0.07     0.50 1.00      939      697
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
The probability of non-volunteering in the preferred model, for people at the mean of Religious Identification and the mean of Household income in this population, is plogis(.35), or 0.5866176. More often than not, we should predict zeros in this population. What predicts the zero component of the model? We can use this syntax:
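In brms, predictors enter the zero-inflation part through a `bf()` formula with a `zi` component. A sketch of the kind of call involved (the model name `b2` matches the comparison tables below; treat the exact call as an assumption, not the original code):

```r
library(brms)

# model both the count part and the zero-inflation part;
# when zi has predictors, its coefficients are on the logit scale
b2 <- brm(
  bf(HoursCharity ~ Relid_s + Household.INC_s,
     zi ~ Relid_s + Household.INC_s),
  family = zero_inflated_negbinomial(),
  data = nz
)
```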
| HoursCharity |  |  |
|---|---|---|
| **Predictors** | **Incidence Rate Ratios** | **CI (95%)** |
| *Count Model* |  |  |
| Intercept | 3.40 | 2.85 – 3.95 |
| Relid_s | 1.02 | 0.94 – 1.11 |
| Household.INC_s | 0.93 | 0.84 – 1.03 |
| *Zero-Inflated Model* |  |  |
| Intercept | 1.06 | 0.72 – 1.38 |
| Relid_s | 0.52 | 0.42 – 0.61 |
| Household.INC_s | 1.02 | 0.87 – 1.17 |
| Observations | 2796 |  |
| R2 Bayes | 0.015 |  |
##    elpd_diff se_diff
## b2     0.0       0.0
## b1   -50.0      10.1
## b0 -1564.8     206.1
##    waic_diff        se
## b2    0.0000   0.00000
## b1  100.0416  20.14189
## b0 3129.6983 412.13260
Zero-inflated negative binomial with no predictors for the zero-inflation part.
If you want to graph each predictor separately, employ the `[[1]]` or `[[2]]` syntax as follows:
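Called without a `terms` argument, `ggeffects::ggpredict()` computes predictions for each model term, and plotting the result gives a list of plots that can be indexed. A sketch (assuming `b2` is the fitted model):

```r
library(ggeffects)

# predictions for every predictor in the model
preds <- ggpredict(b2)

# plot() over the full object returns a list of ggplots;
# index with [[1]], [[2]], ... to render them one at a time
plots <- plot(preds)
plots[[1]]   # first predictor (Relid_s)
plots[[2]]   # second predictor (Household.INC_s)
```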
Predicted effects of religious identification
Gelman, Hill, and Vehtari (2020)
Kurz (2020)
Bürkner and Vuorre (2019)
Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal Regression Models in Psychology: A Tutorial.” Advances in Methods and Practices in Psychological Science 2 (1): 77–101.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge University Press.
Kurz, A. Solomon. 2020. Zenodo. https://doi.org/10.5281/zenodo.4080013.